Co-trained Ensemble Models for Weakly Supervised Cyberbullying Detection

Authors

  • Elaheh Raisi
  • Bert Huang
Abstract

Social media has become an inevitable part of individuals’ social and business lives. Its benefits come with various negative consequences. One major concern is the prevalence of detrimental online behavior on social media, such as online harassment and cyberbullying. In this study, we aim to address the computational challenges associated with harassment detection in social media by developing a machine-learning framework with three distinguishing characteristics. (1) It uses minimal supervision in the form of expert-provided key phrases that are indicative of bullying or non-bullying. (2) It detects harassment with an ensemble of two learners that co-train one another; one learner examines the language content in the message, and the other learner considers the social structure. (3) It incorporates distributed word and graph-node representations by training nonlinear deep models. The model is trained by optimizing an objective function that balances a co-training loss with a weak-supervision loss. We evaluate the effectiveness of our approach using post-hoc, crowdsourced annotation of Twitter data, finding that our deep ensembles outperform previous non-deep methods for weakly supervised harassment detection. We also evaluate on a new benchmark to measure the sensitivity of the detectors to language describing particular social groups.
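
The abstract describes an objective that balances a co-training loss against a weak-supervision loss. The following is a minimal sketch of what such a balance could look like, not the authors' formulation. It assumes that each learner outputs a score in [0, 1] per message, that disagreement between the learners is measured by squared difference, and that the key phrases have already been turned into hypothetical per-message lower and upper bounds on the score. The function names and the `weak_weight` parameter are illustrative only.

```python
from typing import Sequence


def cotraining_loss(content_scores: Sequence[float],
                    social_scores: Sequence[float]) -> float:
    """Mean squared disagreement between the two learners (assumed form)."""
    pairs = list(zip(content_scores, social_scores))
    return sum((c - s) ** 2 for c, s in pairs) / len(pairs)


def weak_supervision_loss(scores: Sequence[float],
                          lower_bounds: Sequence[float],
                          upper_bounds: Sequence[float]) -> float:
    """Mean squared violation of the key-phrase bounds (assumed form).

    A score inside [lower, upper] contributes zero.
    """
    total = 0.0
    for score, lower, upper in zip(scores, lower_bounds, upper_bounds):
        below = max(0.0, lower - score)  # score too low for a message with bullying phrases
        above = max(0.0, score - upper)  # score too high for a message with non-bullying phrases
        total += below ** 2 + above ** 2
    return total / len(scores)


def combined_objective(content_scores: Sequence[float],
                       social_scores: Sequence[float],
                       lower_bounds: Sequence[float],
                       upper_bounds: Sequence[float],
                       weak_weight: float = 1.0) -> float:
    """Co-training loss plus a weighted weak-supervision loss on the ensemble average."""
    ensemble = [(c + s) / 2 for c, s in zip(content_scores, social_scores)]
    return (cotraining_loss(content_scores, social_scores)
            + weak_weight * weak_supervision_loss(ensemble, lower_bounds, upper_bounds))
```

In a real system the scores would come from trainable models and the objective would be minimized by gradient descent; here the functions only evaluate the loss for fixed scores, which is enough to show how the two terms trade off through `weak_weight`.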

Similar resources

A Pre-Trained Ensemble Model for Breast Cancer Grade Detection Based on Small Datasets

Background and Purpose: Nowadays, breast cancer is reported as one of the most common cancers amongst women. Early detection of the cancer type is essential to aid in informing subsequent treatments. The newest proposed breast cancer detectors are based on deep learning. Most of these works focus on large datasets and are not developed for small datasets. Although the large datasets might lead ...

Experts and Machines against Bullies: A Hybrid Approach to Detect Cyberbullies

Cyberbullying is becoming a major concern in online environments with troubling consequences. However, most of the technical studies have focused on the detection of cyberbullying through identifying harassing comments rather than preventing the incidents by detecting the bullies. In this work we study the automatic detection of bully users on YouTube. We compare three types of automatic detect...

Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

A data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed-size windows or gradual forgetting is used. Concept drift refer...

Histopathology Using Only Global Labels: a Weakly-supervised Approach

Analysis of histopathology slides is a critical step for many diagnoses, and in particular in oncology where it defines the gold standard. In the case of digital histopathological analysis, highly trained pathologists must review vast whole-slide images of extreme digital resolution (100,000 pixels) across multiple zoom levels in order to locate abnormal regions of cells, or in some cases singl...

Classification and Disease Localization in Histopathology Using Only Global Labels: A Weakly-Supervised Approach

Analysis of histopathology slides is a critical step for many diagnoses, and in particular in oncology where it defines the gold standard. In the case of digital histopathological analysis, highly trained pathologists must review vast whole-slide images of extreme digital resolution (100,000 pixels) across multiple zoom levels in order to locate abnormal regions of cells, or in some cases singl...

Publication date: 2017